Skip to content

Conversation

alwa-nordic
Copy link
Contributor

When bt_l2cap_send_pdu() succeeds, it transfers buffer ownership to the stack, which must eventually invoke the provided callback. This contract is honored in all paths where transmission becomes impossible:

  • Normal transmission: callback invoked with err=0 after HCI Number of Completed Packets event (tx_notify_process)
  • Send errors (after tx allocated): callback invoked with err=-ESHUTDOWN via conn_tx_destroy
  • Send errors (before tx allocated): callback invoked with the specific error code in send_buf error_return path
  • Connection disconnect: callbacks invoked with err=-ESHUTDOWN via process_unack_tx -> conn_tx_destroy for all PDUs in tx_pending

However, when a channel is deleted (l2cap_chan_del), PDUs remaining in the tx_queue are dropped without invoking their callbacks, violating the ownership contract.

Fix this by extracting and invoking any non-NULL callbacks from the closure stored in buf->user_data before releasing the buffers. The callback is invoked with err=-ESHUTDOWN, making this path analogous to process_unack_tx: both drain queues of unsent PDUs when transmission becomes impossible due to external events (channel deletion vs connection disconnect). The only difference is the buffer lifecycle stage - in l2cap_chan_del, PDUs are still in tx_queue (closure in buf->user_data), while in process_unack_tx, they've progressed to tx_pending (callback in bt_conn_tx struct).

Note: conn_tx_destroy() cannot be used here because no bt_conn_tx struct has been allocated yet - the closure is still in buf->user_data.

Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes a contract violation in the Bluetooth L2CAP layer where transmission callbacks were not being invoked when channels are deleted, leaving callers without notification that their buffers will never be transmitted.

  • Added callback extraction and invocation before buffer cleanup in bt_l2cap_chan_del()
  • Callbacks are invoked with -ESHUTDOWN error code to indicate transmission failure
  • Maintains consistency with other transmission failure paths in the codebase

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

@alwa-nordic alwa-nordic marked this pull request as ready for review October 7, 2025 07:57
@zephyrbot zephyrbot added area: Bluetooth Host Bluetooth Host (excluding BR/EDR) area: Bluetooth labels Oct 7, 2025
@alwa-nordic
Copy link
Contributor Author

The twister failures are unrelated. It seems some python package is missing in CI.

@Thalley
Copy link
Contributor

Thalley commented Oct 7, 2025

The twister failures are unrelated. It seems some python package is missing in CI.

That issue should be fixed by now - Has this PR been rebased recently?

Comment on lines 269 to 294
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible / make sense for this to call conn_tx_destroy in some way?

It seemingly does the same, and would help ensure that -ESHUTDOWN is used in all cases

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how to structure this.

We cannot use conn_tx_destroy as is because we don't have a bt_conn_tx yet. But we also shouldn't if we want to keep the layers separate. Both bt_conn_tx and conn_tx_destroy are internal to conn. Maybe I should not have mentioned it in the code comment given that.

I think it's better to think of it like this: It's always l2cap that decides the callback should be invoked with -ESHUTDOWN. Architecturally, conn should destroy its internal bt_conn_tx and tell l2cap (passing back some l2cap user data) and l2cap should then decide to invoke the higher up callback with -ESHUTDOWN. It's just a kind of optimization that we end up passing the upper layer callback to conn.

I would like to re-architecture the conn l2cap interface to get rid of the arbitrary callback passed to conn. Instead conn could just signal l2cap that a particular connection got a num complete and l2cap should know which channel that was and the channel should know what action to take. But I'm not going to do this now.

When bt_l2cap_send_pdu() succeeds, it transfers buffer ownership to the
stack, which must eventually invoke the provided callback. This contract
is honored in all paths where transmission becomes impossible:

- Normal transmission: callback invoked with err=0 after HCI Number of
  Completed Packets event (tx_notify_process)
- Send errors (after tx allocated): callback invoked with err=-ESHUTDOWN
  via conn_tx_destroy
- Send errors (before tx allocated): callback invoked with the specific
  error code in send_buf error_return path
- Connection disconnect: callbacks invoked with err=-ESHUTDOWN via
  process_unack_tx -> conn_tx_destroy for all PDUs in tx_pending

However, when a channel is deleted (l2cap_chan_del), PDUs remaining in
the tx_queue are dropped without invoking their callbacks, violating the
ownership contract.

Fix this by extracting and invoking any non-NULL callbacks from the
closure stored in buf->user_data before releasing the buffers. The
callback is invoked with err=-ESHUTDOWN, making this path analogous to
process_unack_tx: both drain queues of unsent PDUs when transmission
becomes impossible due to external events (channel deletion vs connection
disconnect). The only difference is the buffer lifecycle stage - in
l2cap_chan_del, PDUs are still in tx_queue (closure in buf->user_data),
while in process_unack_tx, they've progressed to tx_pending (callback in
bt_conn_tx struct).

Note: conn_tx_destroy() cannot be used here because no bt_conn_tx struct
has been allocated yet - the closure is still in buf->user_data.

Signed-off-by: Aleksander Wasaznik <[email protected]>
@alwa-nordic alwa-nordic force-pushed the l2cap_chan_del-ud-leak branch from d4d735f to cace4a9 Compare October 7, 2025 09:13
@alwa-nordic
Copy link
Contributor Author

Rebased. Maybe it's needed to fix CI.

Copy link

sonarqubecloud bot commented Oct 7, 2025

Copy link
Contributor

@MarekPieta MarekPieta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing this one. I confirmed locally that the fix works on my end

while ((buf = k_fifo_get(&le_chan->tx_queue, K_NO_WAIT))) {
bt_conn_tx_cb_t cb = closure_cb(buf->user_data);

if (cb) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (cb) {
if (cb != NULL) {

https://docs.zephyrproject.org/latest/contribute/coding_guidelines/index.html rule 85

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: Bluetooth Host Bluetooth Host (excluding BR/EDR) area: Bluetooth
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants